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ABSTRACT 

The maximum-entropy method is applied to the problem of reconstructing the pro- 
jected mass density of a galaxy cluster using its gravitational lensing effects on back- 
ground galaxies. We demonstrate the method by reconstructing the mass distribution 
in a model cluster using simulated shear and magnification data to which Gaussian 
noise is added. The mass distribution is reconstructed directly and the inversion is 
regularised using the entropic prior for this positive additive distribution. For realistic 
noise levels, we find that the method faithfully reproduces the main features of the 
cluster mass distribution not only within the observed field but also slightly beyond it . 
We estimate the uncertainties on the reconstruction by calculating an analytic approx- 
imation to the covariance matrix of the reconstruction values of each pixel. This result 
is compared with error estimates derived from Monte-Carlo simulations for different 
noise realisations and found to be in good agreement. 

Key words: cosmology - dark matter - gravitational lensing - galaxies: clusters 



1 INTRODUCTION 



Galaxies and galaxy clusters can cause the images of more 
distant galaxies to be distorted and magnified due to grav- 
itational lensing effects (see e.g. Schneider, Ehlers & Falco 
1992 or Blandford & Narayan 1992 for reviews). This natu- 
rally leads one to consider how we may best use these effects 
to reconstruct the mass distribution of the lensing cluster, 
particularly since such a reconstruction is sensitive both to 
luminous and dark matter. 

Typically the distances between the background objects 
and the lens, and between the lens and the observer, are 
much larger than the size of the lens itself so that the mass 
distribution of the lens can be considered as a mass sheet 
lying the lens plane. Thus a lens is fully characterised by its 
surface mass density £(#), which is a function of angular 
position on the sky. 

If at any point in the lens plane the surface mass den- 
sity exceeds some critical value E cr i t , which depends on the 
lensing geometry (see Section ^), then the lens is said to be 
supercritical and will exhibit non-linear 'strong' lensing ef- 
fects. This condition is often satisfied by very dense clusters 
and numerous examples of strong lensing have been observed 
(see e.g. Fort & Mellier 1994). Depending on the position of 
a background galaxy with respect to the caustics of the lens, 
its image can be enormously magnified and distorted so that 
it appears as a giant arc or is multiply imaged. Such effects 
provide powerful constriants on the projected mass in the 
lens that is contained within the arc. 

In most cases, however, the gravitational lensing effects 



are more subtle and typically the lensing cluster produces a 
large number of weakly distorted images of background ob- 
jects, which are called arclets. The surface density of galaxies 
to an R— band magnitude of R ~ 23 mag is on the order of 
~ 10 per square arcmin, rising to ~ 100 per square arcmin 
for galaxies to R ~ 25 mag (e.g. Woods, Fahlman & Richer 
1995). We therefore expect the typical separation of arclets 
to be about 15 arcsec for observations out to R ~ 23 mag, 
falling significantly to about 5 arcsec for R ~ 25 mag. Since 
we do not expect the gravitational potential of a cluster 
to change appreciably over 5-10 arcsec scales, it is possi- 
ble to average the signals from several arclets at a time to 
produce a coherent pattern of distortion. This averaging is 
usually performed by dividing the image into square cells of 
size 30-60 arcsec, which will contain the images of ~ 10- 
50 background galaxies. The pattern of distortion, or shear 
pattern, produced by the lensing cluster is then measured 
by averaging the ellipticities and orientations of the arclets 
in each cell; this is discussed further in Section ^. 

The resulting shear pattern may then be used to esti- 
mate the surface mass density E(0) of the lensing cluster. 
This procedure was first investigated by Tyson, Valdes & 
Wenk (1990) and parameterised fits for cluster shapes were 
performed by Kochanek (1990) and Miralda-Escude (1991). 
The first parameter-free method was presented by Kaiser & 
Squires (1993) in which a Green's function technique is used 
to reconstruct a two-dimensional map of E(0) on the same 
grid of points as that used for averaging ellipticities. 

Although in its original form, the Kaiser & Squires 
method applies only in the weak lensing limit, it was ex- 
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tended to the non-linear regime by Kaiser (1995). However, 
the method still has a number of limitations, many of which 
are shared by other methods. These problems include the 
'mass-sheet degeneracy' (Falco, Gorenstein & Shapiro 1985; 
Schneider & Seitz 1995), which affects any algorithm that 
uses shear information alone to determine the lens mass 
distribution; this can only be broken by also measuring 
the magnification of background galaxies. Another problem 
faced by all reconstruction methods is a scaling uncertainty 
in the reconstructed surface mass density if the redshifts 
of the lensed background objects are not known, although 
some methods for circumventing this problem have been pro- 
posed (e.g. Kneib et al. 1994; Smail, Ellis & Fitchett 1994; 
Bartelmann & Narayan 1995) . Finally, perhaps the most im- 
portant drawback of the Kaiser & Squires method is that it 
requires a convolution of shears to be performed over the en- 
tire sky. As a result, if the field of observed shears is small or 
irregularly-shaped, then the method can produce artefacts 
in the reconstructed E(0) distribution near the boundaries 
of the observed field. Nevertheless, several extensions of the 
basic algorithm have been developed by Kaiser, Squires & 
Broadhurst (1995); Bartelmann (1995) and Seitz & Schnei- 
der (1996). These 'finite-field methods' are based on a line- 
integral approach which reduces the unwanted boundary ef- 
fects. 

More recently, Bartelmann et al. (1996) have presented 
a novel maximum-likelihood approach to reconstructing the 
cluster mass distribution that avoids boundary effects and 
uses both shear and magnification data, thereby breaking 
the mass-sheet degeneracy. The method is based on finding 
the Newtonian gravitational potential that best reproduces 
the observed data in a straightforward least-squares sense. 
This best-fit gravitational potential is then used to obtain 
the mass distribution. The advantage of this method is that, 
by design, it can be applied directly to clusters where the 
observed field is small or irregularly shaped. In addition, 
it is straightforward to incorporate measurement inaccura- 
cies and correlations within the data. Seitz, Schneider & 
Bartelmann (1998) extend this approach to include an en- 
tropy regularisation of the potential and use the individual 
galaxy ellipticities and positions instead of averaging by di- 
viding the image into cells. Squires & Kaiser (1996) suggest 
several reconstruction techniques including two maximum- 
likelihood methods. The most successful of these is regu- 
larised in Fourier space using the prior that the Fourier co- 
efficients of the mass density distribution are drawn from a 
Gaussian distribution of some constant width. 

In this paper we consider maximum likelihood as a spe- 
cial case of the maximum-entropy method (MEM) in the 
context of Bayes' theorem (see Section Q). We use the MEM 
technique to reconstruct the cluster mass distribution from 
a grid of simulated shear and magnification data to which 
random Gaussian noise is added, although the basic method 
we describe could also be extended to use individual galaxy 
ellipticities and positions. We use the regularisation proper- 
ties of MEM to reconstruct the mass distribution directly 
rather than, for example, reconstructing the gravitational 
potential as an intermediate step (Bartelmann et al. 1996, 
Seitz, Schneider & Bartelmann 1998), or assigning a prior to 
the Fourier coefficients of the mass distribution (Squires & 
Kaiser 1996). This is clearly valuable since the mass distri- 
bution is the quantity in which we are most interested and, 
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Figure 1. The gravitational lens geometry. The light ray propa- 
gates from the source S to the observer O and is deflected through 
as angle a so that the image appears at /. The angular separa- 
tions of the source and image from the optic axis are denoted 
by P and 9 respectively. D<j, D s and D^ s are respectively the 
angular-diameter distances from the observer to the lens, from 
the observer to the source, and from the source to the lens. 

in addition, it allows a more straightforward evaluation of 
the uncertainties in the reconstruction. Moreover, since the 
mass distribution is both positive and additive, it is natu- 
ral to assign an entropic prior. Without wanting to enter 
into the complex debate on this issue, there are strong ar- 
guments which suggest that use of the entropy of the image 
as the regularising function is the only consistent way to as- 
sign any positive additive distribution (see Gull 1989 for a 
review). As a final point, we note that in a given observed 
field, the distortions of the images of background galaxies 
are not only influenced by the mass distribution inside the 
observed field but also by mass lying outside the field. It 
is therefore necessary to reconstruct the mass distribution 
some distance beyond the boundary of the observed field, al- 
though the uncertainty in the reconstruction in this region 
will clearly increase rapidly with distance since the lensing 
effect is quite localised. As illustrated in Section [| however, 
the MEM reconstruction of the mass distribution remains 
faithful even slightly outside the observed region. In regions 
far away the observed area, the entropic prior ensures the re- 
construction defaults to the assumed model value wherever 
there is insufficient evidence in the data to the contrary. 



2 GRAVITATIONAL LENSING BY A GALAXY 
CLUSTER 

We begin by considering how the observed distortion and 
magnification of the background galaxies depend on the 
mass distribution of the lensing cluster. For a detailed ac- 
count of gravitational lensing by clusters see e.g. Schneider, 
Ehlers & Falco (1992) or Blandford & Narayan (1992); here 
we summarise the points relevant to our current work. 

The geometry assumed in this paper is illustrated in 
Fig. |l| Using Cartesian coordinates, the local properties of 
the lens mapping between source plane vectors (3 — (/3i, fh) 
and image plane vectors = (81,82) are given by the linear 
transformation 

dp = Ad0, (1) 
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where the Jacobian matrix A is the inverse of the magnifi- 
cation tensor and is given by 



A(0) 



1 — K — 71 
—72 



-72 
1 — K + 71 



(2) 



The convergence k and the shears 71 and 72 are explained 
below. Note that in general the Jacobian matrix is a function 
of position 0. 

The convergence, k, is obtained by rescaling the pro- 
jected mass per unit area in the lens, E, and is defined by 



K{0) 



E(0) 



where the critical surface density £ C rit is given by 



AtyG D d D ds 



(3) 



(4) 



Here D d , D a and D dB are respectively the angular-diameter 
distances from the observer to the lens, from the observer to 
the source, and from the source to the lens; these distances 
are shown in Fig. |lj Convergence acting alone produces an 
isotropic magnification of background galaxies. 

The quantities 71 and 72 describe the anisotropic dis- 
tortion of background galaxy images. 71 describes the shear 
in the x and y directions and 72 describes the shear in the 
x = y and x = — y directions. For algebraic simplicity they 
are often combined into the complex shear 



7 (0) = 71W +172(0). 



(5) 



The modulus I7I describes the magnitude of the shear and 
its argument, <j>, describes the orientation. In the presence of 
both convergence and shear, a circular source of unit radius 
becomes an elliptical image with major and minor axes of 
lengths 



l7l 



b=(l-K+\ 7 \y\ 



and the total magnification is given by 
1 1 



detA 



(1- K )2_| 7 |: 



(6) 



(7) 



The shear 7 and the convergence k are related by 
the scaled two-dimensional gravitational potential ip (see 
Schneider, Ehlers & Falco 1992). Both 7 and k can be writ- 
ten as linear combinations of this potential and are given 
by 



71(0) = |(^ll(0)-^,22(0)) : 
72(0) = ^,12(0), 

k(8) = i(VMi(0)+VA2 2 (0)). 



(8) 
(9) 
(10) 



where the commas and subscripts on ip denote partial dif- 
ferentiation with respect to the image plane vectors 9i, for 
example, 7/1,12 = d 2 ip/88 1882- Indeed the relationships (|^)- 
( [i"o| ) form the basis of the maximum-likelihood reconstruc- 
tion method presented by Bartelmann et al. (1996) in which 
the observed distortion and magnification of background 
galaxies are used to reconstruct the gravitational potential 
of the lensing cluster. The corresponding mass distribution 
is then found using (|lo|), (|| and ([|). 



3 SIMULATING OBSERVATIONS OF 
GRAVITATIONAL LENSING 

In the approach presented below we use shear and magnifi- 
cation data to reconstruct the convergence k(0), which can 
then be converted into the surface mass density E(0) using 
(^). In order to achieve this using our method, it is first nec- 
essary to consider the 'forward' problem of predicting the 
shear and magnification pattern produced by a given con- 
vergence distribution. 

Fourier transforming equations (p|)-([i"o|), we find 

-§(fci-fcf)^(fe), 

— felfe2*0(fe) 



71 (fc) = 

72 (As) = 
k{k) = 



•i (ki + kl) i>(k) 



(ii) 

(12) 
(13) 



where k = (kx,k-i) is the position vector in Fourier space. 
Eliminating ip from these equations we obtain 



7i W = 



72 (fc) = 



|fe| 2 
2feife 2 - 



-k(k) 



Ifcl 2 



k(k). 



(14) 
(15) 



Inverse Fourier transforming and combining the shear com- 
ponents 71 and 72 into the complex shear 7, we find 



~ J 2?(0-0X0')d 2 0', 



7(0) = - 



where the kernel is given by 



V{8) = 



02 - 0i - 2igi6>2 
I0I 4 ' 



(16) 



(17) 



Thus, equations (^) and (|rif ) express the shear pattern as 
a convolution in terms of the convergence k. For a given 
k distribution it is then also straightforward to obtain the 
resulting magnification pattern /i(0) using (Q). 

As mentioned in Section [l], lensing data typically consist 
of an observed field made up of a grid of cells in each of which 
the average distortion and magnification of the background 
sources have been calculated. In each cell the average distor- 
tion can be quantified by measuring the mean ellipticity of 
the lensed images in that cell, whereas the magnification can 
be found either by comparing galaxy counts in the cluster 
field and in an unlensed field (Broadhurst, Taylor & Peacock 
1995; Seitz & Schneider 1997), or by comparing the sizes of 
similiar galaxies in two such fields (Bartelmann & Narayan 
1995). 

To produce realistic simulated observations of the lens- 
ing induced by a given model cluster, it is usual to assume 
some population of background galaxies (see Bartelmann et 
al. 1996) and compute numerically the resulting arclets after 
lensing by the model cluster. These arclets are then analysed 
to produce a grid of averaged ellipticites and magnifications, 
as discussed above. This technique also allows an estimate 
to be made of the uncertainty in the determination of the 
averaged quantities in each cell. 

In order to assess the capabilities of the MEM recon- 
struction technique, we adopt an equivalent approach in 
our simulations, although in a slightly more straightforward 
manner. To simulate an observation, we begin by assuming 
a convergence distribution n{0) for our model cluster. The 
model cluster used is shown in Fig. U and consists simply of 
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Figure 2. The true convergence distribution k(6) = £(0)/£ cr it 
of the model cluster. The dotted line indicates the edge of the 
observed field. 



two isothermal spheres projected onto the lens plane. The 
cluster is simulated on a 16 x 16 grid with a cell size of 1 
arcmin and is everywhere subcritical. We then assume that 
the observed field consists only of the 8x8 grid cells at the 
centre of the 16 x 16 grid; this is delineated by the dashed 
box in Fig. ^ 

In each cell of the observed field the average elliptic- 
ities and magnifications, in the absence of noise, are then 
calculated as follows. The complex ellipticity of an image is 
defined as 

a — b . . . 
e = £i + i£2 = — — r exp{i<p). 
a + b 

where a and b are respectively the lengths of the major and 
minor axes of the elliptical image and <f) is its position angle 
with respect to the £-axis. If we assume that the intrinsic 
ellipticities of the background sources within a cell average 
to zero, then using ^ the predicted average ellipticities tf^ 
(i = 1,2) in the cell due to lensing by the cluster are given 
simply by the mean values of 7i/(l— «) over the cell, where 7i 
is calculated using (^) . The predicted average magnification 



,(p) w , _ „ ^ , 

in practice it is more convenient to work in terms of its 
inverse r^ p \ which from (Q) is given simply by the average 
of (1 - «) 2 — I7I 2 over the cell. 

Finally, random Gaussian noise is added to the pre- 
dicted average ellipticities and inverse magnifications in each 
cell. For the ellipticities, the standard deviation of the noise 
on each grid cell is assumed to be cr e = 0.05 and for 
the inverse magnifications we assume oy = 0.1. The value 
cr e = 0.05 is consistent with that expected for 20 galaxies 
per grid cell (Schneider & Seitz 1995). If found from galaxy 
counts, the estimate of the magnification will be subject to 
Poisson noise, and for 30 galaxies per cell the error will be 
of the order of 10 per cent, and the noise on the inverse 
magnification will also be of the same size. Thus our as- 
sumption of an error oy =0.1 on the inverse magnification 
is not unreasonable. 

Thus the basic data in our simulated lensing observa- 
tion consist of, in each grid cell, the 'observed' average el- 
lipticities e^ ' and and inverse magnification r^°\ each 
of which contains a noise contribution. As an illustration of 



the simulated lensing data, in Fig. g we plot the average 
ellipicity and inverse magnification produced by our model 
cluster along the x and y-axes in Fig. ^[ The solid line rep- 
resents the value of each quantity in the absence of noise, 
the points denote the observed values for our given noise 
realisation and the error bars indicate the rms noise level 
assumed. 



4 THE MAXIMUM-ENTROPY METHOD 



'ob- 



(o) >> and r<°> 



As explained in the previous section, the simulated 
served' data consist of the three quantities £j°' , e. 
in each grid cell of the observed field. Let us denote these 
observed data by the vector 



= (4 0) (i), 



.(°) 



(l).rW(l) 



» 



(AO), 



where e{ (J) and S± (J) denote the observed components of 
the average ellipticity in the jth grid cell and (j) denotes 
the corresponding observed inverse magnification. The total 
number of observed grid cells is N — 8 x 8 = 64 in our sim- 
ulation and thus the vector d has 3N = 192 components. 

From these data, we wish to reconstruct the cluster 
mass distribution (defined in terms of the convergence ft) 
on some grid of points. For simplicity we use the same grid 
for the reconstruction as that on which the cluster mass dis- 
tribution was originally defined (see Fig. ^ , although this is 
not required by the algorithm. Hence we aim to reconstruct 
the convergence digitised on to 16 x 16 cells and we denote 
this distribution by the vector 

K= (ft(l),ft(2),...,ft(L)), 

where is the convergence in the jth cell and L is the 
total number of pixels in the reconstruction grid (i.e. 16 x 16). 
Since the observed field consists only of the central 8x8 grid 
cells, we are therefore attempting to reconstruct the cluster 
mass distribution some way outside the region in which data 
are available. 

We choose our estimator k of the convergence distribu- 
tion to be that which maximises the posterior probability 
Pr(/«jd ). Using Bayes' theorem, this is given by 



Pr(«|d ) = 



Pr(d |/c) Pr(<c) 
Pr(d ) 



(18) 



where Pr(d |/«) is the likelihood of obtaining the data given a 
particular convergence distribution and the evidence Pr(d ) 
is simply a constant that ensures the posterior probability 
is correctly normalised. The prior probability Pr(/«) codifies 
our expectations about the convergence distribution before 
acquiring the data d . Since the evidence is merely normali- 
sation factor, we actually need only to maximise the product 
Pr(d |i«) Pr(«). 

Let us first consider the form of the likelihood Pr(d |n). 
As discussed above, given a convergence distribution k we 
can use equations (jl6|) and (^) to predict the values of the 
noiseless average ellipticities and inverse magnification in 
each observed grid cell. We denote these predicted values in 
the jth grid cell as tf\j), e^C?) an d r ^ P \j) an( i assemble 
the values for all the cells into the predicted data vector 



(l),4 p) (l),r 



(p) 



(1),- 



(A0,4 P) (A0, 



,(p) 



(AO)- 



Therefore, we may consider the observed data vector as 
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Figure 3. Slices along the x and y-axes for each of the observed quantities, ei, t2 and r. The continuous line is the quantity calculated 
directly from the mass distribution, with no noise added. The points are observed values of each quantity for a particular noise realisation. 
The error bars indicate the noise levels used. 



d = dp + n, 

where the vector n contains the noise contribution in each 
observed grid cell for our particular realisation. 

If the noise on the observed data values is Gaussian- 
distributed then the likelihood function is the 3N- 
dimensional multivariate Gaussian 

Pr(d |/t) oc exp [-|(do - d p ) T N" 1 (do - d p )] , 

where N = (nn T ) is the ensemble average noise covariance 
matrix. If there are any correlations between the different 
measured quantities, either within a cell or between cells, 
then they can be straightforwardly incorporated into the 
noise covariance matrix. For example, Bartelmann et al. 
(1996) point out that for faint lensed images it is possible 
that the measured values of the ellipticities and magnifica- 
tion can be correlated. In most cases, however, it is assumed 
that the data values are all independent, so the noise covari- 
ance matrix becomes diagonal and is given by 



N = diag(o-d { 



2 \ 
. °"d(3iV)J 



(19) 



where c^y) is the variance of the noise in the jth data value. 
Then the likelihood becomes 

Pr(d c |#c) oc exp (-|x 2 ) > 



where x 2 is the standard misfit statistic given by 
x 2 = [do(j) -dpU)) 2 



3=1 



(20) 



'd(j) 



Let us now turn our attention to the prior probabil- 
ity Pr(«). The simplest possible prior is the uniform prior, 



which assumes that in each grid cell, before acquiring any 
data, all values of the convergence are equally likely. In this 
case the posterior probability is directly proportional to the 
likelihood and so by maximising the posterior with respect 
to k, we actually obtain the maximum- likelihood estimator 
for the convergence distribution. However, since the conver- 
gence is a positive additive distribution, it may be shown 
(Skilling 1989), using very general notions of subset invari- 
ance, coordinate invariance and system independence, that 
the prior probability assigned to the components of the vec- 
tor k should take the form 



Pr(/*) oc exp[aS'(K, m)], 



(21) 



where the dimensional constant a depends on the scaling 
of the problem and may be considered as a regularising pa- 
rameter, and m is a model vector to which k defaults in 
the absence of any evidence in the data to the contrary. The 
function S(n, m) is the cross-entropy of n and m and is given 
by 



S(k, m) = 2J - m(j) - n(j) In 



2=1 



m(j) 



(22) 



which has a global maximum at k = m. If we have some 
prior knowledge of the structure of the lensing cluster, we 
may include this information in the model m. Otherwise, we 
set the model to have the same value in each pixel. This 
value is generally set to be somewhat smaller than the level 
expected for the reconstructed convergence distribution, but 
in regions where the convergence is well-constrained by the 
data, the level of the model makes no difference to the final 
reconstruction (see Sections). 
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Combining the expression (jl9|) and ( |2~tj ) for the likeli- 
hood and prior respectively, we see that the posterior prob- 
ability distribution is given by 

Pr(«|d ) oc exp(-i X 2 + aS). (23) 

Therefore, maximising this distribution with respect to k is 
equivalent to minimising the function 

F — \x ~ aS. (24) 

From the expressions ( pp[ ) and (53), it is possible to 
calculate analytically the first and second partial deriva- 
tives of F with respect to (j = 1,2,..., L). In order 
to minimise the function F we use a conjugate gradient al- 
gorithm in which the analytical derivative calculations are 
performed using Fast Fourier Transforms. Specifically, we 
use the Polak-Ribiere variant of Fletcher-Reeves conjugate 
gradient method (Press et al. 1992), with modifications to 
the implementation as suggested by Mackay (1996). In fact, 
since the convergence distribution is always positive it is 
convenient to minimise with respect to log n(j) for which 
the corresponding derivatives of F are also easily found. 

Finally, we must consider the value of the regularising 
parameter a. In early implementations of MEM, a was cho- 
sen so that for the final reconstruction the misfit statistic \ 2 
equalled its expectation value, i.e. the number of data values 
3N. This choice is usually refered to as historic MEM. It is, 
however, possible to determine the appropriate value for a 
in a fully Bayesian manner (Skilling 1989; Gull & Skilling 
1990) by simply treating it as another parameter in our hy- 
pothesis space. It may be shown that a must satisfy 

— 2aS(k, m) — L — aTr(M _1 ), (25) 

where k is the convergence distribution that minimises the 
function F for this value of a and L is the total number of 
pixels in the reconstruction. The L x L matrix M is given 
by 

M = G 1/2 HG" 1/2 , 

where H = X7 K X7 K F is the Hessian matrix of the function 
F evaluated at the point k and G = — V^V^S is minus the 
Hessian matrix of the entropy function at k. As mentioned 
above, these matrices can be calculated analytically. 

It may also be shown, however, that the Bayesian value 
for a may be reasonably well approximated by simply choos- 
ing its value such that the value of F at its minimum is 
equal to the half number of data points, i.e. F(k) = 3N/2 
(see Mackay 1996). We have determined a by both methods 
and found them to agree to within a few percent. More im- 
portantly, the resulting reconstructions of the convergence 
distributions for each case are indistinguishable. 

5 ESTIMATING THE RECONSTRUCTION 
ERRORS 

It is essential to be able to estimate the errors in the re- 
construction. From (j2^) and (^4j), we see that the posterior 
probability distribution may be written 

Pr(#e|d ) oc exp[-F(/c)]. 

In general, the posterior distribution may have a compli- 
cated shape. Nevertheless, we may approximate its shape 
near the peak k by a Gaussian of the form 



Pr(n|d ) oc exp [— |(« — k ) t H(k-k)], (26) 

where H is the Hessian matrix mentioned above. Comparing 
( pri| ) with the standard form for a multivariate Gaussian dis- 
tribution, we see that the covariance matrix of the errors on 
the k reconstruction are given approximately by the inverse 
of the Hessian matrix, i.e. 

((k — k)(k — k) T ) « H \ (27) 

where the angle brackets denote the ensemble average. The 
Hessian matrix at the peak of the posterior distribution can 
be calculated analytically and has dimension L x L, where 
L is the total number of pixels in the reconstruction. In our 
simulation L = 256, and so the numerical inversion of the 
matrix may be performed using standard techniques in just 
a few seconds of CPU time. This therefore provides a quick 
approximation to the covariance matrix of the reconstruc- 
tion errors. 

In order for the above method to be reliable, we require 
the Gaussian approximation at the peak of the posterior dis- 
tribution to be reasonably accurate. An alternative method 
for estimating the errors on the reconstruction, which does 
not rely on any approximations, is to use Monte-Carlo sim- 
ulations. By repeating the analysis of the same lensing data 
many times, but using different noise realisations, we can 
measure the covariances of the reconstruction errors numer- 
ically. However, this technique only provides an estimate of 
the ensemble average errors, rather than the errors associ- 
ated with the reconstruction obtained from particular noise 
realisation. In addition, this approach is more computation- 
ally intensive. 

6 APPLICATION TO SIMULATED 
OBSERVATIONS 

We have applied the MEM approach to the simulated ellip- 
ticity and magnification data discussed in Section |^, which 
were produced assuming the cluster convergence distribu- 
tion shown in Fig. [j| Assuming the model m in (^2|) to have 
a uniform value of 0.02 across the whole field, the result- 
ing MEM reconstruction of the convergence distribution is 
shown in Fig. Q This reconstruction converges after about 
90 iterations of the conjugate gradient algorithm to the point 
where the fractional change in F per iteration is less that 
0.1 per cent. The total CPU time required is about 2 min- 
utes on a Sparc Ultra workstation. The Bayesian value of 
the regularising parameter a that satisfies (Efj) was found 
to be a = 2.6, On performing a further 100 iterations to 
achieve an even more tightly defined convergence, the frac- 
tional change in F per iteration decreases to only 0.001 per 
cent and, more importantly, this makes no discernable differ- 
ence to the reconstruction. Similar performances were found 
for numerous other noise realisations and we are therefore 
confident that the MEM solution is extremely stable. 

Comparing the reconstruction with the true distribu- 
tion shown in Fig. we see that within the observed field 
(delineated by the dashed box), the MEM reconstruction has 
faithfully reproduced the main features. In the first layer of 
pixels outside the box some spurious features occur but, as 
discussed below, these are easily shown not to be significant. 
The algorithm succeeded in reconstructing the mass conden- 
sation to the bottom left of the observed field. This feature 
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Figure 4. The maximum-entropy reconstruction of the conver- 
gence distribution of the model cluster shown in Fig. ^| The dot- 
ted line indicates the edge of the observed field. 
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Figure 5. The rms errors in the MEM reconstruction of the 
cluster convergence, as estimated by making a Gaussian approx- 
imation to the posterior probability distribution at its peak. The 
dotted line indicates the edge of the observed field. 



in the reconstruction is in fact found to be significant, de- 
spite the fact that much of it lies outside the observing box. 
We also note that the entropic regularisation of the recon- 
struction has resulted in the outer edges defaulting to the 
model value in the absence of any data to the contrary. 

One may investigate the accuracy of the reconstruction 
and the significance of various features using either of the 
techniques discussed in Section |H| First, we use the Gaus- 
sian approximation to the posterior probability distribution 
to estimate the covariance matrix of the reconstruction er- 
rors. The square-root of the diagonal elements of this matrix 
give the rms errors in the reconstruction in each grid cell, 
and these are plotted in Fig. ^| Comparing this with the 
MEM reconstruction shown in Fig. ^ and the true conver- 
gence distribution in Fig. ^, we see that the rms errors are 
uniformly low within the observing box, with a typical value 
of about 0.05. Outside the observing box the rms errors in- 
crease, as expected. In particular, we see that rms errors 



are indeed highest in the grid cells where the reconstruction 
differs most noticeable from the true distribution. 

As a further illustration of the accuracy of the recon- 
struction, in Fig. [] we plot slices along the x and y-axes for 
the true convergence distribution (solid line) and the MEM 
reconstruction (points); we also plot the rms errors on the 
reconstruction predicted by the Gaussian approximation to 
the posterior distribution. In addition to reconstruction dis- 
cussed above, for which the level of the flat model m was set 
to 0.02, Fig. o also contains similar plots for reconstructions 
calculated from the same lensing data, but for which the 
models levels were set to 0.002 and 0.2 respectively. These 
plots show that within the observing box the level of the 
model makes virtually no difference to the reconstruction. 
Outside the observing box, we see that, as expected, the re- 
construction tends to the model value in regions where there 
is no evidence in the data to the contrary. More importantly, 
we see that in all the plots shown in Fig. |^ the difference be- 
tween the true and reconstructed distributions is consistent 
with the calculated error bars. 

As mentioned in Section ^| we may also investigate the 
accuracy of the MEM reconstruction by performing Monte- 
Carlo simluations in which the same lensing data are anal- 
ysed but for different noise realisations. In Fig. |^ we plot 
the results obtained for 100 Monte-Carlo realisations, in the 
same format as Fig. ^| The points denote the mean recon- 
structed convergence obtained in each grid cell averaged over 
the 100 realisations. The errors bars denote the standard de- 
viation of the convergence in each grid, as measured directly 
from the realisations. In regions where the convergence dis- 
tribution is well-determined by the observations, we see that 
the mean reconstructed value is very close to the true dis- 
tribution, indicating that there no bias in the MEM recon- 
struction. Moreover, in the observed field, the sizes of the 
error bars agree closely with those in Fig. [], which shows 
that the Gaussian approximation to the posterior probabil- 
ity distribution is quite accurate. In regions where the data 
do not constrain the convergence, we see that the recon- 
struction always defaults to the assumed model level and 
hence the ensemble-average error bars are very small. 

In addition we have carried out a 'noise power analysis' 
like that of Seitz & Schneider (1996) and Squires & Kaiser 
(1996). In Fig. ^ (a) we plot the azimuthal average of the 2D 
power spectrum of the residuals between the reconstructed 
field and the original field (taken within the observed field 
only), averaged over 100 noise realisations. The k = power 
point is due to the square of the mean of the residual map, 
averaged over realisations. However, apart from this point, 
the power is low for small k and increases for large k. This 
reflects the fact that the uncertainties in the mass densities 
of neighbouring pixels are strongly correlated, but the er- 
rors in widely spaced pixels are not. This is more easily seen 
in the auto correlation function of the residual map plotted 
in Fig. ^ (b). As expected, the errors in directly neighbour- 
ing pixels are slightly anti-correlated since the effect on the 
shear of adding mass density to one pixel is partially can- 
celled out by taking away mass density from a neighbouring 
pixel. However the autocorrelation function tends to zero 
as the pixel separations increases, indicating that there are 
negligible long-wavelength correlations in the errors. 

Finally, for comparison with the MEM reconstruction 
shown in Fig. ^, we calculate the maximum-likelihood (i.e. 
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Figure 6. Slices through the x and y axes of the convergence distribution. The continuous line is the true distribution and the points are 
the reconstructed values. The error bars show the standard deviation of the reconstruction errors in each cell, as estimated by making a 
Gaussian approximation to the peak of the posterior probability distribution. 
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Figure 7. Slices through the x and y axes of the convergence distribution. The continuous line is the true distribution and the points 
are the mean values of the convergence in each cell averaged over 100 noise realisations. The error bars show the standard deviations of 
the convergence in each cell calculated directly from the 100 reconstructions using different noise realisations. 
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Figure 8. The azimuthally averaged power spectrum (a) and autocorrelation function (b) of the residuals (reconstructed convergence 
minus true convergence) in the observed field averaged over 100 noise realisations. The error bars show the 1-sigma uncertainties in 
each point as derived from the 100 realisations. The plotted power spectrum is the azimuthal average of the 2D power spectrum of the 
residuals map. The normalisation was taken such that the sum of the points in the 2D power spectrum equalled the mean squared value 
of the residuals map. The normalisation of the autocorrelation function was fixed so that it equalled unity at zero separation. 



a = 0) reconstruction obtained using the same lensing data. 
The resulting reconstruction is shown in Fig. after 160 
iterations of the conjugate gradient minimisation algorithm 
at which point the fractional change in F per iteration is 0.1 
per cent. The reconstruction is plotted on the same greyscale 
as Figs |^ and ^, in order to illustrate that in the observed 
field the result is similar to the maximum-entropy recon- 
struction. Outside the observed field, however, there are nu- 
merous spurious features in which the convergence rises to 
unrealistically large values, with a peak value of about 1.3. 
Also the condensation of mass to the bottom left of the 
field is not faithfully reconstructed outside the observed box. 
Moreover, on iterating further, the peak value of the recon- 
structed convergence distribution continues to rise, and after 
800 iterations the fractional change in F per iteration is less 
than 0.001 per cent and the peak convergence is 2.3. Af- 
ter an additional 800 iterations the peak convergence rises 
to 4.0, although the fractional change per iteration in the 
value of F becomes extremely small. For other noise realisa- 
tions, we sometimes found the maximum-likelihood solution 
to be even less stable and, in some cases, the peak value 
of the reconstructed convergence distribution continued to 
rise steadily beyond 1500 iterations to values exceeding 7.0. 
Thus, outside the observing box we find that the maximum- 
likelihood reconstruction does not, in general, converge to 
a numerically stable solution. As discussed in Section W, we 
further note that the maximisation of the likelihood function 
is performed with respect to log«;(j). Thus, the algorithm 
used to calculate the maximum-likelihood reconstruction al- 
ready enforces positivity and may therefore be compared di- 
rectly with more sophisticated reconstruction methods such 
as the Lucy-Richardson algorithm. We would therefore ex- 
pect such methods also to suffer from similar numerical in- 
stabilities. 



7 CONCLUSIONS 

We have presented a new maximum-entropy method for re- 
constructing the projected mass distribution galaxy cluster 
from observations of its gravitational lensing effects. The 
method is ideally suited to the reconstruction of clusters for 
which the observing field is small or irregularly-shaped and 





-400 -200 200 400 

Relative RA (arcsec) 

Figure 9. The maximum likelihood reconstruction of the model 
cluster convergence distribution. The dotted line indicates the 
edge of the observed field. 



does not suffer from unwanted boundary effects that affect 
the reconstructions obtained using the traditional Kaiser & 
Squires algorithm. We find that for realistic levels of uncer- 
tainty in the observed shear and magnification patterns, the 
technique faithfully reproduces the cluster mass distribution 
within the observed field and, moreover, yields a reasonable 
reconstruction some distance beyond the observing box. We 
also show that the errors on the reconstruction can be re- 
liably estimated by making a Gaussian approximation to 
the posterior probability distribution at its peak. In regions 
where the cluster mass distribution is well constrained by the 
lensing observation, we find that the estimated reconstruc- 
tion errors agree well with those obtained from Monte-Carlo 
simulations. 
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